Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments

Neural Information Processing Systems

Domain generalization aims to perform well on unseen test environments using data from a limited number of training environments. Despite a proliferation of proposed algorithms for this task, assessing their performance both theoretically and empirically remains very challenging. Distribution-matching algorithms such as (Conditional) Domain Adversarial Networks [Ganin et al., 2016, Long et al., 2018] are popular and enjoy empirical success, but they lack formal guarantees. Other approaches, such as Invariant Risk Minimization (IRM), require a prohibitively large number of training environments---linear in the dimension of the spurious feature space $d_s$---even on simple data models like the one proposed by Rosenfeld et al. [2021]. Under a variant of this model, we show that both ERM and IRM can fail to find the optimal invariant predictor with $o(d_s)$ environments. We then present an iterative feature matching algorithm that, with high probability, is guaranteed to find the optimal invariant predictor after seeing only $O(\log d_s)$ environments. Our results provide the first theoretical justification, under a concrete nontrivial data model, for the distribution-matching algorithms widely used in practice.
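The idea behind feature matching can be sketched on a toy linear model. The sketch below is not the paper's algorithm; it is a simplified mean-matching variant under assumed synthetic data (the `sample_env` generator, all dimensions, and the ridge fit are illustrative choices): invariant features carry the same label signal in every environment, while spurious features have an environment-specific class mean, and we iteratively project out the directions along which class-conditional means vary across environments.

```python
import numpy as np

rng = np.random.default_rng(0)
d_inv, d_spu, n, n_env = 2, 4, 500, 10   # illustrative sizes
d = d_inv + d_spu

def sample_env(mu_spu):
    # Labels y in {-1, +1}; invariant features have class mean +/-1 in
    # every environment, spurious features a class mean +/-mu_spu that
    # changes from environment to environment.
    y = rng.integers(0, 2, n) * 2 - 1
    x_inv = y[:, None] + rng.normal(0.0, 1.0, (n, d_inv))
    x_spu = y[:, None] * mu_spu + rng.normal(0.0, 1.0, (n, d_spu))
    return np.hstack([x_inv, x_spu]), y

envs = [sample_env(rng.normal(0.0, 2.0, d_spu)) for _ in range(n_env)]

# Feature matching: repeatedly find the direction along which the
# class-conditional feature means vary most across environments and
# project it out, so only environment-invariant directions survive.
P = np.eye(d)
for _ in range(d_spu):
    means = np.array([(x @ P)[y == 1].mean(axis=0) for x, y in envs])
    centered = means - means.mean(axis=0)
    v = np.linalg.svd(centered, full_matrices=False)[2][0]
    P -= np.outer(v, v)  # drop the most environment-dependent direction

# Ridge regression on the matched (projected) features.
X = np.vstack([x for x, _ in envs]) @ P
Y = np.concatenate([y for _, y in envs])
w = np.linalg.solve(X.T @ X + 10.0 * np.eye(d), X.T @ Y)

# Unseen test environment whose spurious correlation is reversed: a
# matched predictor should still rely only on the invariant features.
x_test, y_test = sample_env(-4.0 * np.ones(d_spu))
acc = np.mean(np.sign(x_test @ P @ w) == y_test)
```

With more spurious dimensions than environments per round, one pass like this cannot remove everything; the paper's iterative scheme with fresh environments per round is what brings the requirement down to $O(\log d_s)$.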








A Closed form expressions for the robust risks

Neural Information Processing Systems

In Sections A.1 and A.2 we derive closed-form expressions for the standard and robust risks. We first prove Equation (13) and then the second part of the statement. In Section B we provide additional details on our experiments; Section B.1 covers neural networks on sanitized binary MNIST, where, if not mentioned otherwise, we use noiseless i.i.d. samples. In Section C.1 we give an intuitive explanation for the robust overfitting phenomenon, shedding light on the phenomena revealed by Theorem 3.1 and Figure 2, and in Section C.2 we discuss how inconsistent adversarial training prevents it. We also further discuss the robust logistic regression studied in Section 4: as observed in Section 4.4, label noise can prevent interpolation and hence improve the robust risk. Hence, inconsistent training perturbations can induce spurious regularization effects.
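Closed-form robust risks are easiest to see for linear models. The snippet below is a generic illustration, not necessarily the exact expression derived in this appendix: for a linear predictor under an $\ell_\infty$-bounded adversary of radius $\varepsilon$, the worst-case perturbation simply shifts every margin down by $\varepsilon\,\lVert w\rVert_1$, so the adversarial logistic loss has a closed form (the synthetic `X`, `y`, `w` are made-up test data).

```python
import numpy as np

def robust_logistic_loss(w, X, y, eps):
    # Worst-case l_inf perturbation of radius eps against a linear
    # predictor: delta = -eps * y * sign(w), which reduces each margin
    # y * <w, x> by exactly eps * ||w||_1.
    margins = y * (X @ w) - eps * np.abs(w).sum()
    return np.log1p(np.exp(-margins)).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w = rng.normal(size=5)
y = np.sign(X @ w)  # noiseless labels from the linear model
```

Because the adversary enters only through the $\varepsilon\,\lVert w\rVert_1$ margin shift, the robust loss is monotone in $\varepsilon$ and can be evaluated without any inner maximization.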


AT overview

Neural Information Processing Systems

Each row is a model, and each column is an evaluation setting. A few cells are empty due to resource constraints. As discussed in Section 4.1, multiple models trained on more data achieve positive effective robustness; however, this effect is not uniform. Our experiments suggest that neither growing the number of images nor the number of classes in an i.i.d. fashion is sufficient on its own. One caveat is that our experiments consider only i.i.d. training data.